    Alternating Randomized Block Coordinate Descent

    Block-coordinate descent algorithms and alternating minimization methods are fundamental optimization algorithms and an important primitive in large-scale optimization and machine learning. While various block-coordinate-descent-type methods have been studied extensively, only alternating minimization -- which applies to the setting of only two blocks -- is known to have convergence time that scales independently of the least smooth block. A natural question is then: is the setting of two blocks special? We show that the answer is "no" as long as the least smooth block can be optimized exactly -- an assumption that is also needed in the setting of alternating minimization. We do so by introducing a novel algorithm AR-BCD, whose convergence time scales independently of the least smooth (possibly non-smooth) block. The basic algorithm generalizes both alternating minimization and randomized block coordinate (gradient) descent, and we also provide its accelerated version -- AAR-BCD. As a special case of AAR-BCD, we obtain the first nontrivial accelerated alternating minimization algorithm. Comment: Version 1 appeared in Proc. ICML'18. v1 -> v2: added remarks about how accelerated alternating minimization follows directly from the results that appeared in ICML'18; no new technical results were needed for this.
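    The update structure described above can be illustrated with a minimal sketch, assuming a least-squares objective f(x, y) = 0.5*||A x + B y - c||^2: the blocks of x take randomized coordinate gradient steps with their own smoothness constants, while the block y is minimized exactly. The test problem, block sizes, and step sizes are illustrative assumptions, not the paper's pseudocode.

```python
import numpy as np

# Illustrative sketch (not the paper's exact AR-BCD pseudocode) of the update
# pattern described in the abstract, on f(x, y) = 0.5 * ||A x + B y - c||^2:
# x is split into coordinate blocks updated by randomized gradient steps,
# and the block y is minimized exactly, so its smoothness constant is never used.

rng = np.random.default_rng(0)
n_blocks, block_dim, y_dim, m = 4, 5, 3, 40
A = rng.standard_normal((m, n_blocks * block_dim))
B = rng.standard_normal((m, y_dim))
c = rng.standard_normal(m)

x = np.zeros(n_blocks * block_dim)
y = np.zeros(y_dim)

# Per-block smoothness constants L_i = ||A_i||_2^2 used as gradient step sizes 1/L_i.
blocks = [slice(i * block_dim, (i + 1) * block_dim) for i in range(n_blocks)]
L = [np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks]

def residual(x, y):
    return A @ x + B @ y - c

for _ in range(300):
    # 1) Randomized block-coordinate gradient step on one smooth block of x.
    i = rng.integers(n_blocks)
    blk = blocks[i]
    grad_i = A[:, blk].T @ residual(x, y)
    x[blk] -= grad_i / L[i]
    # 2) Exact minimization over the remaining block y (a small least-squares solve).
    y = np.linalg.lstsq(B, c - A @ x, rcond=None)[0]

print("final objective:", 0.5 * np.linalg.norm(residual(x, y)) ** 2)
```

    The point the abstract emphasizes is visible in the step sizes: only the constants L_i of the randomly sampled blocks enter the iteration, and nothing about the smoothness of the exactly minimized block y is needed.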

    Accelerated Extra-Gradient Descent: A Novel Accelerated First-Order Method

    We provide a novel accelerated first-order method that achieves the asymptotically optimal convergence rate for smooth functions in the first-order oracle model. To this day, Nesterov's Accelerated Gradient Descent (AGD) and variations thereof were the only methods achieving acceleration in this standard blackbox model. In contrast, our algorithm is significantly different from AGD, as it relies on a predictor-corrector approach similar to that used by Mirror-Prox [Nemirovski, 2004] and Extra-Gradient Descent [Korpelevich, 1977] in the solution of convex-concave saddle point problems. For this reason, we dub our algorithm Accelerated Extra-Gradient Descent (AXGD). Its construction is motivated by the discretization of an accelerated continuous-time dynamics [Krichene et al., 2015] using the classical method of implicit Euler discretization. Our analysis explicitly shows the effects of discretization through a conceptually novel primal-dual viewpoint. Moreover, we show that the method is quite general: it attains optimal convergence rates for other classes of objectives (e.g., those with generalized smoothness properties or that are non-smooth and Lipschitz-continuous) using the appropriate choices of step lengths. Finally, we present experiments showing that our algorithm matches the performance of Nesterov's method, while appearing more robust to noise in some cases.
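    The predictor-corrector structure the abstract contrasts with AGD is easiest to see in the classical extra-gradient step, sketched below on a convex quadratic; this generic update (with an assumed conservative step size) only illustrates the predictor-corrector pattern and is not the specific AXGD iterate sequence.

```python
import numpy as np

# Classical extra-gradient (predictor-corrector) step on a convex quadratic
# 0.5 * x^T Q x - b^T x.  This illustrates the pattern the abstract refers to;
# the AXGD iterates themselves use different (accelerated) step sequences.

rng = np.random.default_rng(1)
n = 20
Q0 = rng.standard_normal((n, n))
Q = Q0.T @ Q0 / n + np.eye(n)        # well-conditioned positive definite Q
b = rng.standard_normal(n)

grad = lambda x: Q @ x - b
step = 0.5 / np.linalg.norm(Q, 2)    # conservative step size 0.5 / L

x = np.zeros(n)
for _ in range(300):
    x_pred = x - step * grad(x)      # predictor: gradient step at x
    x = x - step * grad(x_pred)      # corrector: reuse the gradient at the predicted point
print("distance to optimum:", np.linalg.norm(x - np.linalg.solve(Q, b)))
```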

    Cyclic Coordinate Dual Averaging with Extrapolation

    Cyclic block coordinate methods are a fundamental class of optimization methods widely used in practice and implemented as part of standard software packages for statistical learning. Nevertheless, their convergence is generally not well understood and so far their good practical performance has not been explained by existing convergence analyses. In this work, we introduce a new block coordinate method that applies to the general class of variational inequality (VI) problems with monotone operators. This class includes composite convex optimization problems and convex-concave min-max optimization problems as special cases and has not been addressed by the existing work. The resulting convergence bounds match the optimal convergence bounds of full gradient methods, but are provided in terms of a novel gradient Lipschitz condition w.r.t. a Mahalanobis norm. For $m$ coordinate blocks, the resulting gradient Lipschitz constant in our bounds is never larger than a factor $\sqrt{m}$ compared to the traditional Euclidean Lipschitz constant, while it is possible for it to be much smaller. Further, for the case when the operator in the VI has finite-sum structure, we propose a variance-reduced variant of our method which further decreases the per-iteration cost and has better convergence rates in certain regimes. To obtain these results, we use a gradient extrapolation strategy that allows us to view a cyclic collection of block coordinate-wise gradients as one implicit gradient. Comment: 27 pages, 2 figures. Accepted to SIAM Journal on Optimization. Version prior to final copy editing.
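    A schematic sketch of the two ingredients named above, a cyclic pass over coordinate blocks and gradient extrapolation, is given below on a strongly monotone affine VI built from a regularized bilinear saddle point. The test operator, the 2*g - g_prev extrapolation form, and the fixed step size are illustrative assumptions, not the paper's exact updates or its Mahalanobis-norm step-size rule.

```python
import numpy as np

# Schematic sketch (a simplification, not the paper's exact method) of a
# cyclic block pass combined with gradient extrapolation, applied to the
# strongly monotone affine operator
#   F(x, y) = (x + K y - cx,  y - K^T x - cy),
# i.e. the optimality operator of a regularized bilinear saddle point.

rng = np.random.default_rng(2)
d = 10
K = rng.standard_normal((d, d)) / np.sqrt(d)
cx, cy = rng.standard_normal(d), rng.standard_normal(d)

def F_block(j, x, y):
    # Block j of the operator, evaluated at the current point (x, y).
    return x + K @ y - cx if j == 0 else y - K.T @ x - cy

L = 1.0 + np.linalg.norm(K, 2)       # crude Lipschitz estimate for F
eta = 0.1 / L                        # small fixed step size

z = [np.zeros(d), np.zeros(d)]       # two blocks: z[0] = x, z[1] = y
g_prev = [F_block(j, *z) for j in range(2)]

for _ in range(2000):
    for j in range(2):                           # cyclic sweep over the blocks
        g = F_block(j, *z)                       # block gradient at the latest point
        z[j] = z[j] - eta * (2 * g - g_prev[j])  # extrapolated ("optimistic") block step
        g_prev[j] = g                            # remember this block's gradient for the next sweep

print("operator residual:", np.linalg.norm(np.concatenate([F_block(j, *z) for j in range(2)])))
```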

    Information-Computation Tradeoffs for Learning Margin Halfspaces with Random Classification Noise

    We study the problem of PAC learning $\gamma$-margin halfspaces with Random Classification Noise. We establish an information-computation tradeoff suggesting an inherent gap between the sample complexity of the problem and the sample complexity of computationally efficient algorithms. Concretely, the sample complexity of the problem is $\widetilde{\Theta}(1/(\gamma^2 \epsilon))$. We start by giving a simple efficient algorithm with sample complexity $\widetilde{O}(1/(\gamma^2 \epsilon^2))$. Our main result is a lower bound for Statistical Query (SQ) algorithms and low-degree polynomial tests suggesting that the quadratic dependence on $1/\epsilon$ in the sample complexity is inherent for computationally efficient algorithms. Specifically, our results imply a lower bound of $\widetilde{\Omega}(1/(\gamma^{1/2} \epsilon^2))$ on the sample complexity of any efficient SQ learner or low-degree test.
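    For reference, the three sample-complexity quantities stated in the abstract, restated side by side in display form ($\gamma$ is the margin, $\epsilon$ the error parameter):

```latex
% The three quantities from the abstract: the information-theoretic sample
% complexity, the simple efficient algorithm's upper bound, and the lower
% bound for efficient SQ learners / low-degree tests.
\begin{align*}
  \text{sample complexity of the problem:} \quad & \widetilde{\Theta}\!\left(\tfrac{1}{\gamma^{2}\epsilon}\right) \\
  \text{simple efficient algorithm:} \quad & \widetilde{O}\!\left(\tfrac{1}{\gamma^{2}\epsilon^{2}}\right) \\
  \text{efficient SQ / low-degree lower bound:} \quad & \widetilde{\Omega}\!\left(\tfrac{1}{\gamma^{1/2}\epsilon^{2}}\right)
\end{align*}
```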